AITopics | omic data

Collaborating Authors

omic data

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Biology-informed neural networks learn nonlinear representations from omics data to improve genomic prediction and interpretability

Kontolati, Katiana, Gladstone, Rini Jasmine, Davis, Ian, Pickering, Ethan

arXiv.org Artificial IntelligenceOct-17-2025

We extend biologically-informed neural networks (BINNs) for genomic prediction (GP) and selection (GS) in crops by integrating thousands of single-nucleotide polymorphisms (SNPs) with multi-omics measurements and prior biological knowledge. Traditional genotype-to-phenotype (G2P) models depend heavily on direct mappings that achieve only modest accuracy, forcing breeders to conduct large, costly field trials to maintain or marginally improve genetic gain. Models that incorporate intermediate molecular phenotypes such as gene expression can achieve higher predictive fit, but they remain impractical for GS since such data are unavailable at deployment or design time. BINNs overcome this limitation by encoding pathway-level inductive biases and leveraging multi-omics data only during training, while using genotype data alone during inference. Applied to maize gene-expression and multi-environment field-trial data, BINN improves rank-correlation accuracy by up to 56% within and across subpopulations under sparse-data conditions and nonlinearly identifies genes that GWAS/TWAS fail to uncover. With complete domain knowledge for a synthetic metabolomics benchmark, BINN reduces prediction error by 75% relative to conventional neural nets and correctly identifies the most important nonlinear pathway. Importantly, both cases show highly sensitive BINN latent variables correlate with the experimental quantities they represent, despite not being trained on them. This suggests BINNs learn biologically-relevant representations, nonlinear or linear, from genotype to phenotype. Together, BINNs establish a framework that leverages intermediate domain information to improve genomic prediction accuracy and reveal nonlinear biological relationships that can guide genomic selection, candidate gene selection, pathway enrichment, and gene-editing prioritization.

artificial intelligence, deep learning, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2510.1497

Country: North America > United States (0.46)

Genre: Research Report > Experimental Study (0.88)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Oncology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

MS-ConTab: Multi-Scale Contrastive Learning of Mutation Signatures for Pan Cancer Representation and Stratification

Dou, Yifan, Khadre, Adam, Petreaca, Ruben C, Mirzaei, Golrokh

arXiv.org Artificial IntelligenceAug-29-2025

Motivation. Understanding the pan-cancer mutational landscape offers critical insights into the molecular mechanisms underlying tumorigenesis. While patient-level machine learning techniques have been widely employed to identify tumor subtypes, cohort-level clustering, where entire cancer types are grouped based on shared molecular features, has largely relied on classical statistical methods. Results. In this study, we introduce a novel unsupervised contrastive learning framework to cluster 43 cancer types based on coding mutation data derived from the COSMIC database. For each cancer type, we construct two complementary mutation signatures: a gene-level profile capturing nucleotide substitution patterns across the most frequently mutated genes, and a chromosome-level profile representing normalized substitution frequencies across chromosomes. These dual views are encoded using TabNet encoders and optimized via a multi-scale contrastive learning objective (NT-Xent loss) to learn unified cancer-type embeddings. We demonstrate that the resulting latent representations yield biologically meaningful clusters of cancer types, aligning with known mutational processes and tissue origins. Our work represents the first application of contrastive learning to cohort-level cancer clustering, offering a scalable and interpretable framework for mutation-driven cancer subtyping.

artificial intelligence, cancer type, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2508.19424

Country: North America > United States > Ohio (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Oncology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Consistency of Feature Attribution in Deep Learning Architectures for Multi-Omics

Claborne, Daniel, Flores, Javier, Erwin, Samantha, Durell, Luke, Richardson, Rachel, Fore, Ruby, Bramer, Lisa

arXiv.org Machine LearningJul-31-2025

Machine and deep learning have grown in popularity and use in biological research over the last decade but still present challenges in interpretability of the fitted model. The development and use of metrics to determine features driving predictions and increase model i nterpretability continues to be an open area of research. We investigate the use of Shapley Additive Explanations (SHAP) on a multi - view deep learning model applied to multi - omics data for the purposes of identifying biomolecules of interest . Rankings of features via these attribution methods are compared across various architectures to evaluate consistency of the method. We perform multiple computational experiments to assess the robustness of SHAP and investigate modeling approaches and diagnostics to increase and measure the reliability of the identification of important features. Accuracy of a random - forest model fit on subsets of features selected as being most influential as well as clustering quality using o nly these features are used as a measure of enullectiveness of the attribution method. Our findings indicate that the rankings of features resulting from SHAP are sensitive to the choice of architecture as well as dinullerent random initializations of weights, suggesting caution when u sing attribution methods on multi - view deep learning models applied to multi - omics data. We present a n alternative, simple method to assess the robustness of identification of important biomolecules.

artificial intelligence, deep learning, machine learning, (18 more...)

arXiv.org Machine Learning

2507.22877

Country:

North America > United States > California (0.04)
Europe > Czechia > Prague (0.04)

Genre: Research Report > New Finding (0.66)

Industry: Health & Medicine > Therapeutic Area > Oncology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Machine learning-based multimodal prognostic models integrating pathology images and high-throughput omic data for overall survival prediction in cancer: a systematic review

Jennings, Charlotte, Broad, Andrew, Godson, Lucy, Clarke, Emily, Westhead, David, Treanor, Darren

arXiv.org Artificial IntelligenceJul-30-2025

Multimodal machine learning integrating histopathology and molecular data shows promise for cancer prognostication. We systematically reviewed studies combining whole slide images (WSIs) and high-throughput omics to predict overall survival. Searches of EMBASE, PubMed, and Cochrane CENTRAL (12/08/2024), plus citation screening, identified eligible studies. Data extraction used CHARMS; bias was assessed with PROBAST+AI; synthesis followed SWiM and PRISMA 2020. Protocol: PROSPERO (CRD42024594745). Forty-eight studies (all since 2017) across 19 cancer types met criteria; all used The Cancer Genome Atlas. Approaches included regularised Cox regression (n=4), classical ML (n=13), and deep learning (n=31). Reported c-indices ranged 0.550-0.857; multimodal models typically outperformed unimodal ones. However, all studies showed unclear/high bias, limited external validation, and little focus on clinical utility. Multimodal WSI-omics survival prediction is a fast-growing field with promising results but needs improved methodological rigor, broader datasets, and clinical evaluation. Funded by NPIC, Leeds Teaching Hospitals NHS Trust, UK (Project 104687), supported by UKRI Industrial Strategy Challenge Fund.

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2507.16876

Country:

Europe > United Kingdom (1.00)
North America > United States (0.96)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Health Care Providers & Services (1.00)
Health & Medicine > Diagnostic Medicine (1.00)
(3 more...)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

An Uncertainty-Aware Dynamic Decision Framework for Progressive Multi-Omics Integration in Classification Tasks

Mu, Nan, Yang, Hongbo, Zhao, Chen

arXiv.org Artificial IntelligenceJul-3-2025

Background and Objective: High-throughput multi-omics technologies have proven invaluable for elucidating disease mechanisms and enabling early diagnosis. However, the high cost of multi-omics profiling imposes a significant economic burden, with over reliance on full omics data potentially leading to unnecessary resource consumption. To address these issues, we propose an uncertainty-aware, multi-view dynamic decision framework for omics data classification that aims to achieve high diagnostic accuracy while minimizing testing costs. Methodology: At the single-omics level, we refine the activation functions of neural networks to generate Dirichlet distribution parameters, utilizing subjective logic to quantify both the belief masses and uncertainty mass of classification results. Belief mass reflects the support of a specific omics modality for a disease class, while the uncertainty parameter captures limitations in data quality and model discriminability, providing a more trustworthy basis for decision-making. At the multi omics level, we employ a fusion strategy based on Dempster-Shafer theory to integrate heterogeneous modalities, leveraging their complementarity to boost diagnostic accuracy and robustness. A dynamic decision mechanism is then applied that omics data are incrementally introduced for each patient until either all data sources are utilized or the model confidence exceeds a predefined threshold, potentially before all data sources are utilized. Results and Conclusion: We evaluate our approach on four benchmark multi-omics datasets, ROSMAP, LGG, BRCA, and KIPAN. In three datasets, over 50% of cases achieved accurate classification using a single omics modality, effectively reducing redundant testing. Meanwhile, our method maintains diagnostic performance comparable to full-omics models and preserves essential biological insights.

artificial intelligence, machine learning, prediction, (16 more...)

arXiv.org Artificial Intelligence

2507.01032

Country: Asia > China (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Diagnostic Medicine (1.00)
Health & Medicine > Therapeutic Area > Neurology (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(3 more...)

Add feedback

Graph Kolmogorov-Arnold Networks for Multi-Cancer Classification and Biomarker Identification, An Interpretable Multi-Omics Approach

Alharbi, Fadi, Budhiraja, Nishant, Vakanski, Aleksandar, Zhang, Boyu, Elbashir, Murtada K., Mohammed, Mohanad

arXiv.org Artificial IntelligenceMar-28-2025

The integration of multi-omics data presents a major challenge in precision medicine, requiring advanced computational methods for accurate disease classification and biological interpretation. This study introduces the Multi-Omics Graph Kolmogorov-Arnold Network (MOGKAN), a deep learning model that integrates messenger RNA, micro RNA sequences, and DNA methylation data with Protein-Protein Interaction (PPI) networks for accurate and interpretable cancer classification across 31 cancer types. MOGKAN employs a hybrid approach combining differential expression with DESeq2, Linear Models for Microarray (LIMMA), and Least Absolute Shrinkage and Selection Operator (LASSO) regression to reduce multi-omics data dimensionality while preserving relevant biological features. The model architecture is based on the Kolmogorov-Arnold theorem principle, using trainable univariate functions to enhance interpretability and feature analysis. MOGKAN achieves classification accuracy of 96.28 percent and demonstrates low experimental variability with a standard deviation that is reduced by 1.58 to 7.30 percents compared to Convolutional Neural Networks (CNNs) and Graph Neural Networks (GNNs). The biomarkers identified by MOGKAN have been validated as cancer-related markers through Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis. The proposed model presents an ability to uncover molecular oncogenesis mechanisms by detecting phosphoinositide-binding substances and regulating sphingolipid cellular processes. By integrating multi-omics data with graph-based deep learning, our proposed approach demonstrates superior predictive performance and interpretability that has the potential to enhance the translation of complex multi-omics data into clinically actionable cancer diagnostics.

artificial intelligence, kolmogorov, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2503.22939

Country:

Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.24)
North America > United States > Idaho > Latah County > Moscow (0.04)
North America > United States > Arkansas > Benton County > Bentonville (0.04)
(4 more...)

Genre: Research Report > New Finding (0.68)

Industry:

Health & Medicine > Therapeutic Area > Oncology > Carcinoma (1.00)
Health & Medicine > Therapeutic Area > Gastroenterology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

FedRBE -- a decentralized privacy-preserving federated batch effect correction tool for omics data based on limma

Burankova, Yuliya, Klemm, Julian, Lohmann, Jens J. G., Taheri, Ahmad, Probul, Niklas, Baumbach, Jan, Zolotareva, Olga

arXiv.org Artificial IntelligenceDec-8-2024

Batch effects in omics data obscure true biological signals and constitute a major challenge for privacy-preserving analyses of distributed patient data. Existing batch effect correction methods either require data centralization, which may easily conflict with privacy requirements, or lack support for missing values and automated workflows. To bridge this gap, we developed fedRBE, a federated implementation of limma's removeBatchEffect method. We implemented it as an app for the FeatureCloud platform. Unlike its existing analogs, fedRBE effectively handles data with missing values and offers an automated, user-friendly online user interface ( https://featurecloud.ai/app/fedrbe). Leveraging secure multi-party computation provides enhanced security guarantees over classical federated learning approaches. We evaluated our fedRBE algorithm on simulated and real omics data, achieving performance comparable to the centralized method with negligible differences (no greater than 3.6E-13). By enabling collaborative correction without data sharing, fedRBE facilitates large-scale omics studies where batch effect correction is crucial.

correction, dataset, removebatcheffect, (13 more...)

arXiv.org Artificial Intelligence

2412.05894

Country:

North America > United States > California (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Asia > China (0.04)
(6 more...)

Genre: Research Report > Experimental Study (1.00)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Oncology (1.00)
(2 more...)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Biomedical Informatics (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Add feedback

SGUQ: Staged Graph Convolution Neural Network for Alzheimer's Disease Diagnosis using Multi-Omics Data

Tao, Liang, Xie, Yixin, Deng, Jeffrey D, Shen, Hui, Deng, Hong-Wen, Zhou, Weihua, Zhao, Chen

arXiv.org Artificial IntelligenceOct-14-2024

Alzheimer's disease (AD) is a chronic neurodegenerative disorder and the leading cause of dementia, significantly impacting cost, mortality, and burden worldwide. The advent of high-throughput omics technologies, such as genomics, transcriptomics, proteomics, and epigenomics, has revolutionized the molecular understanding of AD. Conventional AI approaches typically require the completion of all omics data at the outset to achieve optimal AD diagnosis, which are inefficient and may be unnecessary. To reduce the clinical cost and improve the accuracy of AD diagnosis using multi-omics data, we propose a novel staged graph convolutional network with uncertainty quantification (SGUQ). SGUQ begins with mRNA and progressively incorporates DNA methylation and miRNA data only when necessary, reducing overall costs and exposure to harmful tests. Experimental results indicate that 46.23% of the samples can be reliably predicted using only single-modal omics data (mRNA), while an additional 16.04% of the samples can achieve reliable predictions when combining two omics data types (mRNA + DNA methylation). In addition, the proposed staged SGUQ achieved an accuracy of 0.858 on ROSMAP dataset, which outperformed existing methods significantly. The proposed SGUQ can not only be applied to AD diagnosis using multi-omics data but also has the potential for clinical decision-making using multi-viewed data. Our implementation is publicly available at https://github.com/chenzhao2023/multiomicsuncertainty.

artificial intelligence, machine learning, prediction, (17 more...)

arXiv.org Artificial Intelligence

2410.11046

Country:

North America > United States > Georgia > Cobb County > Marietta (0.04)
North America > United States > Michigan (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(4 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.68)

Industry: Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

PRAGA: Prototype-aware Graph Adaptive Aggregation for Spatial Multi-modal Omics Analysis

Huang, Xinlei, Ma, Zhiqi, Meng, Dian, Liu, Yanran, Ruan, Shiwei, Sun, Qingqiang, Zheng, Xubin, Qiao, Ziyue

arXiv.org Artificial IntelligenceSep-19-2024

Spatial multi-modal omics technology, highlighted by Nature Methods as an advanced biological technique in 2023, plays a critical role in resolving biological regulatory processes with spatial context. Recently, graph neural networks based on K-nearest neighbor (KNN) graphs have gained prominence in spatial multi-modal omics methods due to their ability to model semantic relations between sequencing spots. However, the fixed KNN graph fails to capture the latent semantic relations hidden by the inevitable data perturbations during the biological sequencing process, resulting in the loss of semantic information. In addition, the common lack of spot annotation and class number priors in practice further hinders the optimization of spatial multi-modal omics models. Here, we propose a novel spatial multi-modal omics resolved framework, termed PRototype-Aware Graph Adaptative Aggregation for Spatial Multi-modal Omics Analysis (PRAGA). PRAGA constructs a dynamic graph to capture latent semantic relations and comprehensively integrate spatial information and feature semantics. The learnable graph structure can also denoise perturbations by learning cross-modal knowledge. Moreover, a dynamic prototype contrastive learning is proposed based on the dynamic adaptability of Bayesian Gaussian Mixture Models to optimize the multi-modal omics representations for unknown biological priors. Quantitative and qualitative experiments on simulated and real datasets with 7 competing methods demonstrate the superior performance of PRAGA.

graph, relation, semantic relation, (15 more...)

arXiv.org Artificial Intelligence

2409.12728

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.56)

Add feedback

CMOB: Large-Scale Cancer Multi-Omics Benchmark with Open Datasets, Tasks, and Baselines

Yang, Ziwei, Kotoge, Rikuto, Chen, Zheng, Piao, Xihao, Matsubara, Yasuko, Sakurai, Yasushi

arXiv.org Artificial IntelligenceSep-2-2024

Machine learning has shown great potential in the field of cancer multi-omics studies, offering incredible opportunities for advancing precision medicine. However, the challenges associated with dataset curation and task formulation pose significant hurdles, especially for researchers lacking a biomedical background. Here, we introduce the CMOB, the first large-scale cancer multi-omics benchmark integrates the TCGA platform, making data resources accessible and usable for machine learning researchers without significant preparation and expertise.To date, CMOB includes a collection of 20 cancer multi-omics datasets covering 32 cancers, accompanied by a systematic data processing pipeline. CMOB provides well-processed dataset versions to support 20 meaningful tasks in four studies, with a collection of benchmarks. We also integrate CMOB with two complementary resources and various biological tools to explore broader research avenues.All resources are open-accessible with user-friendly and compatible integration scripts that enable non-experts to easily incorporate this complementary information for various tasks. We conduct extensive experiments on selected datasets to offer recommendations on suitable machine learning baselines for specific applications. Through CMOB, we aim to facilitate algorithmic advances and hasten the development, validation, and clinical translation of machine-learning models for personalized cancer treatments. CMOB is available on GitHub (\url{https://github.com/chenzRG/Cancer-Multi-Omics-Benchmark}).

cancer, dataset, multi-omic data, (14 more...)

arXiv.org Artificial Intelligence

2409.02143

Country:

Asia > Japan > Honshū > Kansai > Osaka Prefecture > Osaka (0.05)
North America > United States > Florida > Broward County > Fort Lauderdale (0.04)
Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry:

Health & Medicine > Therapeutic Area > Oncology > Carcinoma (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Add feedback